Bayesian Analysis of the Independent Multinormal Process--Neither
Mean nor Precision Known
By  Albert  Ando  and  G.M.  Kaufman 
SUMMARY 

Under the assumption that neither the mean vector nor the variance-covariance
matrix is known with certainty, the natural conjugate
family  of  prior  densities  for  the  multivariate  Normal  process  is 
identified.   Prior-posterior  and  preposterior  analysis  is  done 
assuming  that  the  prior  is  in  the  natural  conjugate  family.  A 
procedure  is  presented  for  obtaining  non-degenerate  joint  posterior 
and  preposterior  distributions  of  all  parameters  even  when  the 
number  of  objective  sample  observations  is  less  than  the  number 
of  parameters  of  the  process. 

1.  Introduction

In  this  paper  we  develop  the  distribution  theory  necessary  to  carry  out 
Bayesian  analysis  of  the  multivariate  Normal  process  as  defined  in  section  1.1 
below  when  neither  the  mean  vector  nor  the  variance-covariance  matrix  of  the 
process is known with certainty.  The development here generalizes Raiffa and
Schlaifer's treatment of the multivariate Normal process as done in Part B,
Chapter  12  of  [5],  in  which  it  is  assumed  that  the  variance-covariance  matrix  is 
known  up  to  a  particular  multiplicative  constant.   We  drop  this  assumption  here. 

In  section  1  we  define  the  process,  identify  a  class  of  natural  conjugate 
distributions, and do prior-posterior analysis.  The conditional and unconditional
sampling distributions of some (sufficient) statistics are presented in
section 2.  In particular, we prove that the distribution of the sample mean
vector marginal with respect to the sample variance-covariance matrix, to the
process mean vector, and to the process variance-covariance matrix is multivariate
Student whenever the prior is in the natural conjugate family of distributions.  We
then use the results of sections 1 and 2 to do preposterior analysis in section 3.

We also show in section 3 that Bayesian joint inference--finding joint
posterior and preposterior densities of the mean vector and the
variance-covariance matrix--is possible even when classical joint inference is not, i.e.
when the number of objective sample observations is less than the number of
distinct elements of the mean vector and variance-covariance matrix of the process.

Geisser and Cornfield [3] and Tiao and Zellner [7] analyze the multivariate
Normal process and multivariate Normal Regression process respectively
under  identical  assumptions  about  the  state  of  knowledge  of  the  parameters 
of  the  process.   Their  presentations  differ  from  that  given  here  in  three 
respects:   first,  following  the  lead  of  Jeffreys  [4],  both  sets  of  authors 
assume  that  the  joint  prior  on  the  mean  vector  and  variance-covariance  matrix 
is  a  special  (degenerate)  case  of  a  natural  conjugate  density;  second,  here 
we  find  sampling  distributions  unconditional  as  regards  the  parameters  of  the 
process  and  do  preposterior  analysis;  third,  by  doing  the  analysis  for  the 
complete  natural  conjugate  family,  we  are  able  to  provide  a  procedure  for 
deriving  joint  posterior  and  some  joint  preposterior  distributions  of  all 
parameters  under  conditions  mentioned  in  the  paragraph  immediately  above. 


1.1  Definition  of  the  Process 

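As in [1], we define an r-dimensional Independent Multinormal process as one
that generates independent r x 1 random vectors $\tilde{x}^{(1)}, \ldots, \tilde{x}^{(j)}, \ldots$ with identical
densities

$$ f_N^{(r)}(x \mid \mu, h) \equiv (2\pi)^{-\frac{1}{2}r}\, e^{-\frac{1}{2}(x-\mu)^t h (x-\mu)}\, |h|^{\frac{1}{2}}, \qquad -\infty < x < \infty, \quad -\infty < \mu < \infty, \quad h \text{ is PDS}. \tag{1} $$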

1.2  Likelihood  of  a  Sample 

The likelihood that the process will generate n successive values
$x^{(1)}, \ldots, x^{(j)}, \ldots, x^{(n)}$ is

$$ (2\pi)^{-\frac{1}{2}rn}\, e^{-\frac{1}{2}\sum_{j}(x^{(j)}-\mu)^t h (x^{(j)}-\mu)}\, |h|^{\frac{1}{2}n}. \tag{2} $$

If the stopping process is non-informative, as defined in [1], this is the
likelihood of a sample consisting of n observations $x^{(1)}, \ldots, x^{(j)}, \ldots, x^{(n)}$.
When neither h nor $\mu$ is known, we may compute these statistics:

$$ m \equiv \frac{1}{n}\sum_{j} x^{(j)}, \qquad \nu \equiv n - r \ \text{(redundant)}, \tag{3a} $$

and

$$ V \equiv \sum_{j} (x^{(j)}-m)(x^{(j)}-m)^t. \tag{3b} $$
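
The statistics in (3a) and (3b) can be computed directly from a list of observations. The following is a minimal sketch in plain Python; the function name `sufficient_statistics` and the small data set are illustrative assumptions, not part of the paper.

```python
def sufficient_statistics(xs):
    """Return (m, V, n, nu) for a list of r-dimensional observations.

    m  : sample mean vector, as in (3a)
    V  : sum of outer products of deviations from m, as in (3b)
    nu : n - r, the (redundant) statistic of (3a)
    """
    n = len(xs)
    r = len(xs[0])
    m = [sum(x[i] for x in xs) / n for i in range(r)]
    V = [[sum((x[i] - m[i]) * (x[j] - m[j]) for x in xs) for j in range(r)]
         for i in range(r)]
    nu = n - r
    return m, V, n, nu


# Hypothetical sample: n = 3 observations of dimension r = 2.
m, V, n, nu = sufficient_statistics([[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]])
```

Note that V is a sum of squares and cross-products, not divided by n or n - 1.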

It is well known¹ that the kernel of the joint likelihood of (m, V) is,
provided $\nu > 0$,

$$ e^{-\frac{1}{2}n(m-\mu)^t h (m-\mu)}\, |h|^{\frac{1}{2}} \cdot e^{-\frac{1}{2}\operatorname{tr} hV}\, |h|^{\frac{1}{2}(\nu+r-1)}, \tag{4a} $$

the kernel of the marginal likelihood of m is, provided $\nu > 0$,

$$ e^{-\frac{1}{2}n(m-\mu)^t h (m-\mu)}\, |h|^{\frac{1}{2}}, \tag{4b} $$

and the kernel of the marginal likelihood of V is, provided $\nu > 0$ and V
is PDS,

$$ e^{-\frac{1}{2}\operatorname{tr} hV}\, |h|^{\frac{1}{2}(\nu+r-1)}. \tag{4c} $$

¹ See, for example, Anderson [1], Theorem 3.3.2 and pp. 154-160.

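Formula (4c) is the kernel of a Wishart distribution.

A random matrix $\tilde{V}$ of dimension (r x r) will be called "Wishart distributed
with parameter (h, $\nu$)" if

$$ \tilde{V} \sim f_W^{(r)}(V \mid h, \nu) \equiv \begin{cases} w(r,\nu)\, |V|^{\frac{1}{2}\nu-1}\, e^{-\frac{1}{2}\operatorname{tr} hV}\, |h|^{\frac{1}{2}(\nu+r-1)} & \text{if V is PDS and } \nu > 0, \\ 0 & \text{otherwise}, \end{cases} \tag{4d} $$

where

$$ w(r,\nu) \equiv \Big[ 2^{\frac{1}{2}(\nu+r-1)r}\, \pi^{r(r-1)/4} \prod_{i=1}^{r} \Gamma\big(\tfrac{1}{2}(\nu+r-i)\big) \Big]^{-1}. \tag{4e} $$

That (m, V, $\nu$) defines a set of sufficient statistics for ($\mu$, h) is shown
in section 3.3.3 of [1].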
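
For numerical work it is usually safer to evaluate the normalizing constant w(r, ν) of the Wishart density (4d) on the log scale. The sketch below assumes the standard Wishart constant, i.e. the inverse of 2^{(ν+r-1)r/2} π^{r(r-1)/4} times the product of Γ((ν+r-i)/2) for i = 1, ..., r; the function name is an illustrative assumption.

```python
import math


def log_wishart_constant(r, nu):
    """Return log w(r, nu) for the Wishart density (4d); requires nu > 0.

    Assumes the standard form
    w(r, nu) = [2^{(nu+r-1)r/2} * pi^{r(r-1)/4} * prod_i Gamma((nu+r-i)/2)]^{-1}.
    """
    if nu <= 0:
        raise ValueError("nu must be positive")
    log_inverse = 0.5 * (nu + r - 1) * r * math.log(2.0)
    log_inverse += 0.25 * r * (r - 1) * math.log(math.pi)
    for i in range(1, r + 1):
        log_inverse += math.lgamma(0.5 * (nu + r - i))
    return -log_inverse


# For r = 1 the Wishart density reduces to a gamma density in V,
# and w(1, nu) = 1 / (2^{nu/2} * Gamma(nu/2)).
log_w = log_wishart_constant(1, 2)
```

For r = 1 and ν = 2 this gives w = 1/2, which matches the gamma density with shape 1 and rate h/2.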

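We will wish to express (4a) in such a fashion that it automatically reduces
to (4b) when only (m, n) is available and to (4c) when only (V, n) is available.
In addition, we will wish to treat the cases that arise when V is singular.  Hence
we define

$$ \Phi \equiv \begin{cases} \nu + r - 1 & \text{if } \nu \le 0, \\ 0 & \text{if } \nu > 0, \end{cases} \tag{5a} $$

$$ V^* \equiv \begin{cases} V & \text{if V is non-singular}, \\ 0 & \text{otherwise}, \end{cases} \tag{5b} $$

and

$$ \delta \equiv \begin{cases} 1 & \text{if } n > 0, \\ 0 & \text{if } n = 0. \end{cases} \tag{5c} $$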


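In terms of (5a), (5b) and (5c) we may rewrite (4a), (4b) and (4c) as

$$ e^{-\frac{1}{2}n(m-\mu)^t h (m-\mu)}\, |h|^{\frac{1}{2}\delta}\, e^{-\frac{1}{2}\operatorname{tr} hV^*}\, |h|^{\frac{1}{2}(\nu-\Phi+r-1)}, \tag{4a'} $$

$$ e^{-\frac{1}{2}n(m-\mu)^t h (m-\mu)}\, |h|^{\frac{1}{2}\delta}, \tag{4b'} $$

$$ e^{-\frac{1}{2}\operatorname{tr} hV^*}\, |h|^{\frac{1}{2}(\nu-\Phi+r-1)}. \tag{4c'} $$

Notice that (4a') is now defined even when $\nu < 0$.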
By adopting the convention that

(1)  $V^* = 0$ and $\Phi = n-1$ when V is unknown or irrelevant, and

(2)  n = 0 when m is unknown or irrelevant,

the kernel (4a') reduces to (4b') in the first case and to (4c') in the second.

1.3  Conjugate Distributions of ($\mu$, h), $\mu$ and h

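When both $\tilde{\mu}$ and $\tilde{h}$ are random variables, the natural conjugate of (4a') is
the Normal-Wishart distribution $f_{NW}^{(r)}(\mu, h \mid m, V, n, \nu)$ defined as equal to

$$ \begin{cases} k(r,\nu)\, e^{-\frac{1}{2}n(\mu-m)^t h (\mu-m)}\, |h|^{\frac{1}{2}\delta}\, e^{-\frac{1}{2}\operatorname{tr} hV^*}\, |V^*|^{\frac{1}{2}(\nu+r-1)}\, |h|^{\frac{1}{2}\nu-1} \\ \qquad = f_N^{(r)}(\mu \mid m, hn)\, f_W^{(r)}(h \mid V, \nu) & \text{if } n > 0 \text{ and } \nu > 0, \\ 0 & \text{otherwise}, \end{cases} \tag{6a} $$

where $V^*$ and $\delta$ are defined as in (5), and

$$ k(r,\nu) \equiv (2\pi)^{-\frac{1}{2}r}\, n^{\frac{1}{2}r}\, w(r,\nu). \tag{6b} $$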

If (6a) is to be a proper density function V must be PDS, $\nu > 0$ and n > 0,
so that in this case $\delta = 1$ and $V^* = V$ is PDS.  We write the first expression in
(6a) with $\delta$ and $V^*$ so that formulas for posterior densities will generalize
automatically to the case where prior information is such that one or more of
these conditions holds:  V is singular, n = 0, $\nu = 0$.

We obtain the marginal prior of $\tilde{\mu}$ by integrating (6a) with respect to h;
if $\nu > 0$, n > 0, V is PDS, and we define $H_\nu \equiv \nu n V^{-1}$, then

$$ D(\mu \mid m, V, n, \nu) = f_S^{(r)}(\mu \mid m, H_\nu, \nu) \propto [\nu + (\mu-m)^t H_\nu (\mu-m)]^{-\frac{1}{2}(\nu+r)}. \tag{6c} $$

This distribution of $\tilde{\mu}$ is the non-degenerate multivariate Student distribution
defined in formula (8-26) of [5].²

Proof:  We integrate over the region $R_h \equiv \{h \mid h \text{ is PDS}\}$:

$$ D(\mu \mid m, V, n, \nu) = \int_{R_h} f_N^{(r)}(\mu \mid m, hn)\, f_W^{(r)}(h \mid V, \nu)\, dh \propto \int_{R_h} e^{-\frac{1}{2}\operatorname{tr} h\,(n(\mu-m)(\mu-m)^t + V)}\, |h|^{\frac{1}{2}(\nu+\delta)-1}\, dh. $$

But as the integrand in the integral immediately above is the kernel of a Wishart
density with parameter $(n(\mu-m)(\mu-m)^t + V,\ \nu+\delta)$,

$$ D(\mu \mid m, V, n, \nu) \propto [1 + (\mu-m)^t (nV^{*-1}) (\mu-m)]^{-\frac{1}{2}(\nu+\delta+r-1)}. $$

Provided that $\nu > 0$, V is PDS and n > 0, $H_\nu = \nu n V^{-1}$ is PDS, $\delta = 1$, and we have (6c).

If $\nu > 0$ but V is singular, $V^{*-1}$ does not exist so neither does the marginal
distribution of $\tilde{\mu}$.  And if n = 0 the marginal distribution of $\tilde{\mu}$ does not exist.

Similarly, we obtain the marginal prior on $\tilde{h}$ by integrating (6a) with
respect to $\mu$.  If $\nu > 0$, n > 0 and V is PDS, then

$$ D(h \mid m, V, n, \nu) = f_W^{(r)}(h \mid V, \nu) \propto e^{-\frac{1}{2}\operatorname{tr} hV}\, |h|^{\frac{1}{2}\nu-1}. \tag{6d} $$

If a Normal-Wishart distribution with parameter (m', V', n', $\nu'$) is
assigned to ($\tilde{\mu}$, $\tilde{h}$) and if a sample then yields a statistic (m, V, n, $\nu$), the posterior
distribution of ($\tilde{\mu}$, $\tilde{h}$) will be Normal-Wishart with parameter (m", V*", n", $\nu''$),
where

$$ n'' = n' + n, \qquad \delta'' \equiv \begin{cases} 1 & \text{if } n'' > 0, \\ 0 & \text{if } n'' = 0, \end{cases} \qquad m'' = n''^{-1}(n'm' + n\,m), \tag{7a} $$

$$ \nu'' = \nu' + \nu + r + \delta + \delta' - \delta'' - \Phi - 1, \tag{7b} $$

$$ V'' \equiv V' + V + n'm'm'^t + n\,m\,m^t - n''m''m''^t, \qquad V^{*}{}'' \equiv \begin{cases} V'' & \text{if } V'' \text{ is PDS}, \\ 0 & \text{otherwise}. \end{cases} \tag{7c} $$

² Cornfield and Geisser [ ] prove a similar result:  if the prior
on ($\mu$, h) has a kernel $|h|^{\frac{1}{2}\nu'-1}$, $\nu' > 0$, and we observe a sample which yields a
statistic (m, V, n, $\nu$), $\nu > 0$, then the marginal posterior distribution of $\tilde{\mu}$ is
multivariate Student.
Proof:  When V' and V are both PDS, the prior density and the sample likelihood
combine to give the posterior density in the usual manner.  When either V' or
V or both are singular, the prior density (6a) or the sample likelihood or both
may not exist.  Even in such cases, we wish to allow for the possibility that
the posterior density may be well defined.  For this purpose, we define the
posterior density in terms of V' and V rather than V'* and V*.  Thus, multiplying
the kernel of the prior density by the kernel of the likelihood, we obtain

$$ e^{-\frac{1}{2}n'(\mu-m')^t h (\mu-m')}\, |h|^{\frac{1}{2}\delta'}\, e^{-\frac{1}{2}\operatorname{tr} hV'}\, |h|^{\frac{1}{2}\nu'-1} \cdot e^{-\frac{1}{2}n(m-\mu)^t h (m-\mu)}\, |h|^{\frac{1}{2}\delta}\, e^{-\frac{1}{2}\operatorname{tr} hV}\, |h|^{\frac{1}{2}(\nu+r-\Phi-1)} \tag{7d} $$

$$ = e^{-\frac{1}{2}S}\, |h|^{\frac{1}{2}\delta''}\, e^{-\frac{1}{2}\operatorname{tr} h(V'+V)}\, |h|^{\frac{1}{2}(\nu'+\nu+r+\delta'+\delta-\delta''-\Phi-1)-1}, $$

where

$$ S \equiv n'(\mu-m')^t h (\mu-m') + n(m-\mu)^t h (m-\mu). $$

Since h is symmetric, by using the definitions of (7a), we may write S as

$$ (\mu-m'')^t (hn'') (\mu-m'') - m''^t (hn'') m'' + m'^t (hn') m' + m^t (hn) m. $$

Now, since

$$ m'^t (hn') m' + m^t (hn) m - m''^t (hn'') m'' = \operatorname{tr} h\,[\,n'(m'm'^t) + n(m\,m^t) - n''(m''m''^t)\,], $$

by defining $\nu''$ as in (7b), V" and V*" as in (7c), we may write the kernel (7d) as

$$ e^{-\frac{1}{2}(\mu-m'')^t (hn'') (\mu-m'')}\, |h|^{\frac{1}{2}\delta''}\, e^{-\frac{1}{2}\operatorname{tr} hV^{*}{}''}\, |h|^{\frac{1}{2}\nu''-1}, $$

which is the kernel of a Normal-Wishart density with parameter (m", V", n", $\nu''$).
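
The updating rule (7a)-(7c) can be sketched in a few lines of plain Python. This is an illustration only: the function names are hypothetical, and the value Φ = ν + r - 1 for ν ≤ 0 (and Φ = 0 otherwise) is an assumption inferred from the convention Φ = n - 1 stated in section 1.2. The example uses a prior with n' = 0, ν' = 1 and V' = M I together with a sample of n = 2 observations in r = 3 dimensions, so that V is singular.

```python
def delta(n):
    """Indicator of (5c): 1 if n > 0, else 0."""
    return 1 if n > 0 else 0


def phi(nu, r):
    """Phi of (5a). Assumption: nu + r - 1 when nu <= 0, else 0."""
    return nu + r - 1 if nu <= 0 else 0


def outer(a, b):
    return [[ai * bj for bj in b] for ai in a]


def posterior_parameters(prior, sample, r):
    """Combine prior (m', V', n', nu') with statistic (m, V, n, nu) via (7a)-(7c).

    Requires n' + n > 0 so that m'' is defined.
    """
    m1, V1, n1, nu1 = prior
    m, V, n, nu = sample
    n2 = n1 + n
    m2 = [(n1 * a + n * b) / n2 for a, b in zip(m1, m)]
    nu2 = nu1 + nu + r + delta(n) + delta(n1) - delta(n2) - phi(nu, r) - 1
    P1, P, P2 = outer(m1, m1), outer(m, m), outer(m2, m2)
    V2 = [[V1[i][j] + V[i][j] + n1 * P1[i][j] + n * P[i][j] - n2 * P2[i][j]
           for j in range(r)] for i in range(r)]
    return m2, V2, n2, nu2


# Hypothetical numbers: r = 3, n = 2 observations (1,0,0) and (3,2,0).
M = 0.5
prior = ([0.0, 0.0, 0.0], [[M, 0, 0], [0, M, 0], [0, 0, M]], 0, 1)
sample = ([2.0, 1.0, 0.0], [[2.0, 2.0, 0.0], [2.0, 2.0, 0.0], [0.0, 0.0, 0.0]], 2, -1)
m2, V2, n2, nu2 = posterior_parameters(prior, sample, 3)
```

Under these assumptions the posterior parameters come out as n" = 2, ν" = 1, m" = m and V" = V + M I, which is non-singular even though V is not.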

We remark here that a prior of the form (6a) lacks flexibility when $\nu$ is
small because of the manner in which this functional form interrelates the
distribution of $\tilde{\mu}$ and $\tilde{h}$.  The nature of this interrelationship is currently being
examined, and will be reported in a later paper.³

2.  Sampling Distributions with Fixed n

We assume here that a sample of size n is to be drawn from an r-dimensional
Independent Multinormal process whose parameter ($\tilde{\mu}$, $\tilde{h}$) is a random variable having
a Normal-Wishart distribution with parameter (m', V', n', $\nu'$).

2.1  Conditional Joint Distribution of (m, V | $\mu$, h)

The conditional joint distribution of the statistic (m, V) given that the
process parameter has value ($\mu$, h) is, provided $\nu > 0$,

$$ D(m, V \mid \mu, h, \nu) = f_N^{(r)}(m \mid \mu, hn)\, f_W^{(r)}(V \mid h, \nu) \tag{8} $$

as shown in section 1.

2.2  Siegel's Generalized Beta Function

Siegel [6] established a class of integral identities with matrix argument
that generalize the Beta and Gamma functions.  We will use these integral
identities in the proofs that the unconditional sampling distributions of m, of
V and of (m, V) are as shown in sections 2.3, 2.4, and 2.5.  (In fact, the
integrand in Siegel's identity for the generalized Gamma function is the kernel
of a Wishart density.)

³ Ando, A. and Kaufman, G., Extended Natural Conjugate Distributions for the
Multinormal Process.



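Let X be (r x r) and define

$$ \Gamma_r(a) \equiv \pi^{r(r-1)/4}\, \Gamma(a)\, \Gamma(a-\tfrac{1}{2}) \cdots \Gamma\big(a-\tfrac{r-1}{2}\big) \tag{9a} $$

and

$$ B_r(a, b) \equiv \frac{\Gamma_r(a)\,\Gamma_r(b)}{\Gamma_r(a+b)}, \tag{9b} $$

where a > (r-1)/2, b > (r-1)/2.  Siegel established the following integral
identities:  letting $R_X \equiv \{X \mid X \text{ is PDS}\}$,

$$ \int_{R_X} \frac{|X|^{a-\frac{1}{2}(r+1)}}{|I+X|^{a+b}}\, dX = B_r(a, b). \tag{9c} $$

Defining $Y = (I+X)^{-1}X$, letting V and B be real symmetric matrices and letting
V < Y < B denote the set $\{Y \mid B-Y,\ Y-V \text{ are PDS}\}$,

$$ \int_{R_Y} |Y|^{a-\frac{1}{2}(r+1)}\, |I-Y|^{b-\frac{1}{2}(r+1)}\, dY = B_r(a, b), \tag{9d} $$

where the domain of integration is $R_Y \equiv \{Y \mid 0 < Y < I\}$.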
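
The generalized Gamma and Beta functions of (9a) and (9b) are easy to evaluate numerically. The sketch below uses only Python's standard `math` module; the function names `gamma_r` and `beta_r` are illustrative assumptions.

```python
import math


def gamma_r(a, r):
    """Generalized Gamma function of (9a); requires a > (r - 1) / 2."""
    if a <= (r - 1) / 2:
        raise ValueError("need a > (r - 1) / 2")
    value = math.pi ** (r * (r - 1) / 4)
    for i in range(r):
        value *= math.gamma(a - i / 2)
    return value


def beta_r(a, b, r):
    """Generalized Beta function of (9b)."""
    return gamma_r(a, r) * gamma_r(b, r) / gamma_r(a + b, r)


# For r = 1 both functions reduce to the ordinary Gamma and Beta functions.
scalar_beta = beta_r(2.0, 3.0, 1)
```

For r = 1 the value `scalar_beta` is the ordinary Beta function B(2, 3) = 1/12, and for r = 2 the definition gives Γ_2(3/2) = π/2.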

We shall define the standardized generalized Beta density function as

$$ f_{\beta}^{(r)}(Y \mid a, b) \equiv B_r^{-1}(a, b)\, |Y|^{a-\frac{1}{2}(r+1)}\, |I-Y|^{b-\frac{1}{2}(r+1)}, \qquad a > \tfrac{1}{2}(r-1), \quad b > \tfrac{1}{2}(r-1). \tag{9e} $$

Similarly the standardized inverted generalized Beta density function is defined as

$$ f_{i\beta}^{(r)}(X \mid a, b) \equiv B_r^{-1}(a, b)\, \frac{|X|^{a-\frac{1}{2}(r+1)}}{|I+X|^{a+b}}, \qquad a > \tfrac{1}{2}(r-1), \quad b > \tfrac{1}{2}(r-1). \tag{9f} $$

We also define the non-standardized inverted generalized Beta density function

$$ f_{i\beta}^{(r)}(V \mid a, b, C) \equiv [B_r^{-1}(a, b)\, |C|^{b}]\, \frac{|V|^{a-\frac{1}{2}(r+1)}}{|V+C|^{a+b}}, \qquad a > \tfrac{1}{2}(r-1), \quad b > \tfrac{1}{2}(r-1), \quad C \text{ is PDS}, \quad V \in R_V \equiv \{V \mid V \text{ is PDS}\}. \tag{9g} $$

The functions (9e) and (9f) are related via the integrand transform
$Y = (I+X)^{-1}X$, which has Jacobian $J(Y, X) = |I-Y|^{-(r+1)}$.  The function (9g) may be
transformed into (9f) by an integrand transform $T X T^t = V$, where T is a non-singular
upper triangular matrix such that $T T^t = C$.

2.3  Unconditional Joint Distribution of (m, V)

The unconditional (with respect to ($\tilde{\mu}$, $\tilde{h}$)) joint distribution of (m, V) has
density

$$ D(m, V \mid m', V', n', \nu'; n, \nu) = \int_{R_h}\int_{R_\mu} f_N^{(r)}(m \mid \mu, hn)\, f_W^{(r)}(V \mid h, \nu)\, f_{NW}^{(r)}(\mu, h \mid m', V', n', \nu')\, d\mu\, dh, \tag{10a} $$

where the domain of integration $R_\mu$ of $\mu$ is $(-\infty, +\infty)$ and $R_h$ of h is $\{h \mid h \text{ is PDS}\}$.
It follows that, provided $\nu > 0$, $\nu' > 0$, and n' > 0,

$$ D(m, V \mid m', V', n'; n, \nu) \propto \frac{|V|^{\frac{1}{2}\nu-1}}{|C+V|^{\frac{1}{2}(\nu''+r-1)}}, \tag{10b} $$

where

$$ n_u \equiv \frac{n'n}{n'+n} \qquad \text{and} \qquad C \equiv n_u (m-m')(m-m')^t + V'. $$

Proof:  Using (4a') and (8) we may write

$$ D(m, V \mid m', V', n', \nu'; n, \nu) = \int_{R_h} \Big[ \int_{R_\mu} f_N^{(r)}(m \mid \mu, hn)\, f_N^{(r)}(\mu \mid m', hn')\, d\mu \Big]\, f_W^{(r)}(V \mid h, \nu)\, f_W^{(r)}(h \mid V', \nu')\, dh. $$

The inner integral is $f_N^{(r)}(m \mid m', hn_u)$.  Hence the total integral may be written as

$$ \int_{R_h} f_N^{(r)}(m \mid m', hn_u)\, f_W^{(r)}(V \mid h, \nu)\, f_W^{(r)}(h \mid V', \nu')\, dh, $$

which upon replacing the $f_N$ and $f_W$'s by their respective formulas and dropping
constants becomes

$$ |V|^{\frac{1}{2}\nu-1} \int_{R_h} e^{-\frac{1}{2}n_u(m-m')^t h (m-m')}\, e^{-\frac{1}{2}\operatorname{tr} h(V'+V)}\, |h|^{\frac{1}{2}(\nu'+\nu+r+\delta+\delta'-\delta''-\Phi-1)-1}\, dh. $$

Using the definitions of (7), this equals

$$ |V|^{\frac{1}{2}\nu-1} \int_{R_h} e^{-\frac{1}{2}\operatorname{tr} h\,(n_u(m-m')(m-m')^t + V' + V)}\, |h|^{\frac{1}{2}\nu''-1}\, dh. $$

Letting $B \equiv n_u(m-m')(m-m')^t + V + V'$, from (4c) we see that the integrand in the above
integral is, aside from the multiplicative constant $w(r, \nu'')\,|B|^{\frac{1}{2}(\nu''+r-1)}$, a
Wishart density with parameter (B, $\nu''$).  Hence, apart from a normalizing constant
depending on neither V nor m,

$$ D(m, V \mid m', V', n', \nu'; n, \nu) \propto |V|^{\frac{1}{2}\nu-1}\, |B|^{-\frac{1}{2}(\nu''+r-1)}, $$

proving (10b).

2.4  Unconditional Distribution of m

The kernel of the unconditional (with respect to ($\tilde{\mu}$, $\tilde{h}$)) distribution of m
can be found in three ways:  by utilizing the fact that $m-\tilde{\mu}$ and $\tilde{\mu}$ are conditionally
independent given $\tilde{h} = h$ and then finding the unconditional distribution of m as
regards h; by integrating $D(m, V \mid m', V', n', \nu'; n, \nu)$ over the range of V when
this distribution exists, i.e. over $R_V \equiv \{V \mid V \text{ is PDS}\}$; or by
integrating the kernel of the marginal likelihood of m defined in (4b) multiplied
by the kernel of the prior density of ($\tilde{\mu}$, $\tilde{h}$) over $R_\mu$ and $R_h$.  The first and third
methods have the merit of in no way depending on whether or not V is singular--
that is, even if $\nu < 0$ (n < r), the proofs go through, which of course is not
the case if we proceed according to the second method.  We show first by the
second method that when $\nu > 0$, $\nu' > 0$, and V'* = V' is PDS,

$$ D(m \mid m', V', n', \nu'; n, \nu) = \int_{R_V} D(m, V \mid m', V', n', \nu'; n, \nu)\, dV \propto [1 + n_u (m-m')^t V'^{-1} (m-m')]^{-\frac{1}{2}(\nu'+r)}. \tag{11a} $$

We then show by the first method that this result holds even when $\nu < 0$.  If we
define $H_\nu \equiv \nu' n_u V'^{-1}$, then (11a) may be rewritten as

$$ [\nu' + (m-m')^t H_\nu (m-m')]^{-\frac{1}{2}(\nu'+r)}. \tag{11b} $$

Provided $\nu' > 0$, $n_u > 0$ and $H_\nu$ is PDS this is the kernel of the nondegenerate
Student density function with parameter (m', $H_\nu$, $\nu'$) as defined in formula (8-26)
of [5].

Proof:  In (10b) let $a = (\tfrac{1}{2}\nu - 1) + \tfrac{1}{2}(r+1)$ and $b = \tfrac{1}{2}(\nu''+r-1) - a = \tfrac{1}{2}(\nu'+r)$.  Then the
kernel of the unconditional distribution of m is proportional to

$$ \int_{R_V} \frac{|V|^{a-\frac{1}{2}(r+1)}}{|V+C|^{a+b}}\, dV, \tag{12} $$

where $R_V \equiv \{V \mid V \text{ is PDS}\}$.  Then by (9g),

$$ D(m \mid m', V', n', \nu'; n) \propto |n_u(m-m')(m-m')^t + V'|^{-b} \propto (1 + (m-m')^t n_u V'^{-1} (m-m'))^{-b}, $$

establishing (11a).

Alternate Proof:  Another way of establishing (11) is to use the fact that the
kernel of the marginal likelihood of m, given the parameter ($\mu$, h) and n > 0
observations, is by (4b) and (4b'), whether or not $\nu < 0$,

$$ e^{-\frac{1}{2}n(m-\mu)^t h (m-\mu)}\, |h|^{\frac{1}{2}}. \tag{4b} $$

Furthermore, conditional on $\tilde{h} = h$, $\tilde{\mu}$ and $\tilde{\epsilon} \equiv m - \tilde{\mu}$ are independent Normal
random vectors; and so $m = \tilde{\mu} + \tilde{\epsilon}$ is Normal with mean vector

$$ E(m) = E(\tilde{\mu}) + E(\tilde{\epsilon}) = m' + 0 = m' $$

and variance-covariance matrix

$$ V(m) = V(\tilde{\mu}) + V(\tilde{\epsilon}) = (h\,n_u)^{-1}. $$

Thus integrating with respect to h,

$$ D(m \mid V', n', \nu'; n, \nu) \propto \int_{R_h} e^{-\frac{1}{2}\operatorname{tr} h\,\{n_u(m-m')(m-m')^t + V'\}}\, |h|^{\frac{1}{2}(\nu'+1)-1}\, dh. $$

As the integrand in the above integral is the kernel of a Wishart density with
parameter $(n_u(m-m')(m-m')^t + V',\ \nu'+1)$,

$$ D(m \mid V', n', \nu'; n, \nu) \propto |n_u(m-m')(m-m')^t + V'|^{-\frac{1}{2}(\nu'+r)}, $$

and (11) follows directly.

2.5  Unconditional Distribution of V When $\nu > 0$

The kernel of the unconditional distribution of V can be found by integrating
$D(m, V \mid m', V', n', \nu'; n, \nu)$ over the range of m, or by integrating the product of
the kernel of the marginal likelihood (4c) of V and the kernel of the distribution
(6a) of ($\tilde{\mu}$, $\tilde{h}$) with parameter (m', V', n', $\nu'$) over the range of ($\tilde{\mu}$, $\tilde{h}$).  We show
by the former method that for $a > \tfrac{1}{2}(r-1)$ and $b > \tfrac{1}{2}(r-1)$, when $\nu > 0$ and V is PDS,

$$ D(V \mid m', V', n'; n) \propto \frac{|V|^{a-\frac{1}{2}(r+1)}}{|V'+V|^{a+b}}, \tag{16a} $$

where $a = \tfrac{1}{2}(\nu+r-1)$ and $b = \tfrac{1}{2}(\nu'+r-1)$.  Letting K be a non-singular upper triangular
matrix such that $K K^t = V'$, we may make the integrand transform $K Z K^t = V$ and write
(16a) as

$$ D(Z \mid m', V', n'; n) \propto \frac{|Z|^{a-\frac{1}{2}(r+1)}}{|I+Z|^{a+b}}. \tag{16b} $$

Formula (16b) is the kernel of a standardized inverted generalized Beta function
with parameter (a, b).

Proof:  From (10b),

$$ D(m, V \mid m', V', n'; n, \nu) \propto |V|^{\frac{1}{2}\nu-1}\, |V' + V + n_u(m-m')(m-m')^t|^{-\frac{1}{2}(\nu''+r-1)}. $$

Conditional on $\tilde{V} = V$, the second determinant is the kernel of a Student density of
m with parameter $(m',\ n_u(\nu''-1)[V'+V]^{-1},\ \nu''-1)$.  That part of the constant which
normalizes this Student density and which involves V is $|V'+V|^{-\frac{1}{2}}$; hence
(16a) follows.

Now since V' is PDS there is a non-singular triangular matrix K of order r
such that $K K^t = V'$.  If we make the integrand transform $K Z K^t = V$, and let J(Z, V)
denote the Jacobian of the transform, the transformed kernel may then be written as

$$ \frac{|K Z K^t|^{a-\frac{1}{2}(r+1)}}{|K K^t + K Z K^t|^{a+b}}\, J(Z, V) = |K K^t|^{-b-\frac{1}{2}(r+1)}\, J(Z, V)\, \frac{|Z|^{a-\frac{1}{2}(r+1)}}{|I+Z|^{a+b}}. $$

Since $J(Z, V) = |K|^{r+1} = |K K^t|^{\frac{1}{2}(r+1)} = |V'|^{\frac{1}{2}(r+1)}$, we may write the transformed
kernel as shown in (16b).

3.  Preposterior Analysis with Fixed n > 0

We assume that a sample of fixed size n > 0 is to be drawn from an
r-dimensional Multinormal process whose mean vector $\mu$ and matrix precision h
are not known with certainty but are regarded as random variables ($\tilde{\mu}$, $\tilde{h}$) having
a prior Normal-Wishart distribution with parameter (m', V', n', $\nu'$) where n' > 0,
$\nu' > 0$, but V' may or may not be singular.

3.1  Joint Distribution of (m", V")

The joint density of (m", V") is, provided n' > 0, $\nu' > 0$, and $\nu > 0$,

$$ D(m'', V'' \mid m', V', n', \nu'; n, \nu) \propto \frac{|V'' - V' - n^*(m''-m')(m''-m')^t|^{\frac{1}{2}\nu-1}}{|V''|^{\frac{1}{2}(\nu''+r-1)}}, \tag{17a} $$

where

$$ n^* \equiv n'n''/n, \tag{17b} $$

and the range of (m", V") is

$$ R_{(m'',V'')} \equiv \{(m'', V'') \mid -\infty < m'' < +\infty \text{ and } V''-C \text{ is PDS}\}. \tag{17c} $$

Proof:  Following the argument of subsections 12.1.4 and 12.6.1 of [5] we can
establish that

$$ n_u (m-m')^t h (m-m') = n^*(m''-m')^t h (m''-m'), \tag{18a} $$

where $n^* = n'n''/n$.  The same line of argument allows us to write

$$ V'' = V' + V + n'(m'm'^t) + n(m\,m^t) - n''(m''m''^t) = V' + V + n^*(m''-m')(m''-m')^t. \tag{18b} $$

From (7) we have

$$ (m, V) = \Big( \tfrac{1}{n}(n''m'' - n'm'),\ \ V'' - V' - n^*(m''-m')(m''-m')^t \Big). $$

Letting J(m", V"; m, V) denote the Jacobian of the integrand transformation from
(m, V) to (m", V"), we make this transformation in (10b), obtaining (17a).  Since
J(m", V"; m, V) = J(m", m) J(V", V) and both J(m", m) and J(V", V) are constants
involving neither m" nor V", neither does J(m", V"; m, V).  When $\nu < 0$ and V is singular the
numerator in (17a) vanishes, so that the density exists only if $\nu > 0$.  However
if $\nu > 0$, the kernel in (17a) exists even if V' is singular (V'* = 0).  Hence we
write the kernel as shown in (17a).

That the range of (m", V") is $R_{(m'',V'')}$ as defined in (17c) follows directly
from the definitions of m" and V", of $R_m$ and $R_V$, and of m' and V'.
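
The identities behind (18a) and (18b) can be checked numerically. The sketch below is plain Python with hypothetical names and numbers; it compares the three forms of the cross-product term: n'm'm'^t + n m m^t - n"m"m"^t as in (7c), n_u(m-m')(m-m')^t with n_u = n'n/(n'+n), and n*(m"-m')(m"-m')^t with n* = n'n"/n.

```python
def outer(a, b):
    return [[ai * bj for bj in b] for ai in a]


def scaled(c, A):
    return [[c * x for x in row] for row in A]


def cross_term_forms(m_prior, n_prior, m, n):
    """Return the three algebraically equivalent forms of the cross-product term."""
    n_post = n_prior + n
    m_post = [(n_prior * a + n * b) / n_post for a, b in zip(m_prior, m)]
    n_u = n_prior * n / (n_prior + n)
    n_star = n_prior * n_post / n
    d_sample = [b - a for a, b in zip(m_prior, m)]     # m - m'
    d_post = [c - a for a, c in zip(m_prior, m_post)]  # m'' - m'
    P1, P, P2 = outer(m_prior, m_prior), outer(m, m), outer(m_post, m_post)
    r = len(m)
    form_7c = [[n_prior * P1[i][j] + n * P[i][j] - n_post * P2[i][j]
                for j in range(r)] for i in range(r)]
    form_u = scaled(n_u, outer(d_sample, d_sample))
    form_star = scaled(n_star, outer(d_post, d_post))
    return form_7c, form_u, form_star


# Hypothetical values: m' = (1, 0), n' = 2, m = (4, 3), n = 4.
form_7c, form_u, form_star = cross_term_forms([1.0, 0.0], 2, [4.0, 3.0], 4)
```

With these numbers every entry of all three matrices equals 12, up to rounding.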

3.2  Some Distributions of m"

The unconditional distribution of m" is easily derived from the unconditional
distribution (11b) of m;  provided n > 0, n' > 0, $\nu' > 0$, and V' is PDS,

$$ D(m'' \mid m', V', n', \nu'; n, \nu) = f_S^{(r)}(m'' \mid m', (n''/n)^2 H_\nu, \nu'), \tag{19a} $$

where

$$ H_\nu \equiv \nu' n_u V'^{-1}, \tag{19b} $$

and the conditional distribution of m" given $\tilde{V}'' = V''$ is, provided $\nu > 0$, n' > 0,
$\nu' > 0$,

$$ D(m'' \mid m', V', n', \nu'; n, \nu, V'') = f_{iS}^{(r)}(m'' \mid m', H_*, \nu), \tag{20a} $$

where

$$ H_* \equiv \nu n^* (V''-V')^{-1}. \tag{20b} $$

The right hand side of (20a) is the inverted Student density function with
parameter (m', $H_*$, $\nu$) as defined in formula (8-34) of [5].

Proof:  Since $m'' = \frac{1}{n''}(n'm' + n\,m)$ and since from (11b) when n > 0, n' > 0, $\nu' > 0$,
and V' is PDS,

$$ m \sim f_S^{(r)}(m \mid m', H_\nu, \nu'), $$

by Theorem 1 of subsection 8.3.2 of [5],

$$ m'' \sim f_S^{(r)}(m'' \mid m', (n''/n)^2 H_\nu, \nu'). $$

To prove (20) observe that the kernel of the conditional distribution of m"
given $\tilde{V}'' = V''$ is proportional to (17a), and so

$$ D(m'' \mid m', V', n', \nu'; n, \nu, V'') \propto |V'' - V' - n^*(m''-m')(m''-m')^t|^{\frac{1}{2}\nu-1}. $$

Since

$$ V'' - V' = V + n^*(m''-m')(m''-m')^t, $$

(V"-V') will be PDS as long as $\nu > 0$, and so $(V''-V')^{-1}$ is also PDS.  Using a well known
determinantal identity and letting $H_*$ be as defined in (20b), when V"-V'
is PDS we may write the density of m" as

$$ [1 - n^*(m''-m')^t (V''-V')^{-1} (m''-m')]^{\frac{1}{2}\nu-1} = \nu^{-\frac{1}{2}\nu+1}\, [\nu - (m''-m')^t H_* (m''-m')]^{\frac{1}{2}\nu-1}, $$

which, aside from the constant $\nu^{-\frac{1}{2}\nu+1}$, is the kernel of an inverted Student
density function with parameter (m', $H_*$, $\nu$).
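
The determinantal identity used in the last step is presumably the rank-one update formula for determinants (often called the matrix determinant lemma). A short sketch of how it applies here, with the shorthand A = V"-V' and d = m"-m' introduced only for this illustration:

```latex
% Rank-one update: for a nonsingular r x r matrix A, an r x 1 vector d, and a scalar c,
|A - c\, d\, d^t| = |A| \, (1 - c\, d^t A^{-1} d).

% Apply it with A = V'' - V', d = m'' - m', c = n^*.
% Since H_* = \nu n^* A^{-1}, we have n^* d^t A^{-1} d = \nu^{-1} d^t H_* d, so
|A - n^* d\, d^t|^{\frac{1}{2}\nu - 1}
  = |A|^{\frac{1}{2}\nu - 1} \, \nu^{-(\frac{1}{2}\nu - 1)} \,
    [\nu - d^t H_* d]^{\frac{1}{2}\nu - 1}.

% Conditional on V'', the factor |A|^{\frac{1}{2}\nu - 1} is a constant in m''.
```

The same identity, with a plus sign in place of the minus sign, takes the determinant in the proof of (11a) to the bracketed Student form.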

3.3  Analysis When n < r

Even when n < r, it is possible to do Bayesian inference on ($\tilde{\mu}$, $\tilde{h}$) by
appropriately structuring the prior so that the posterior of ($\tilde{\mu}$, $\tilde{h}$) is non-degenerate.

For example, if the data generating process is Multinormal, if we assign a
prior on ($\tilde{\mu}$, $\tilde{h}$) with parameter (0, 0, 0, 0), and then observe a sample
$x^{(1)}, \ldots, x^{(n)}$ where n < r, then $\nu < 0$ and the posterior parameters defined in (7)
assume values

$$ n'' = n, \qquad \nu'' = \nu + r - \Phi - 1 = 0, \qquad m'' = m, \qquad V^{*}{}'' = 0. $$

The posterior distribution of ($\tilde{\mu}$, $\tilde{h}$) is degenerate under these circumstances.

If, however, we insist on assigning a very diffuse prior on ($\tilde{\mu}$, $\tilde{h}$), but
are willing to introduce just enough prior information to make V" non-singular
and $\nu'' > 0$, then the posterior distribution is non-degenerate; e.g. assign $\nu' = 1$,
V' = M I, M > 0, and leave n' = 0 and m' = 0, so that $\nu'' = 1$, V" = V + V' = V + M I.  In this
case we have a bona fide non-degenerate Normal-Wishart posterior distribution
of ($\tilde{\mu}$, $\tilde{h}$).

Furthermore, the unconditional distribution of the next sample observation
$x^{(n+1)}$ exists and is, by (11b), multivariate Student with parameter
$(m,\ \frac{n}{n+1}(V + MI)^{-1},\ 1)$.
In addition, for this example the distribution, unconditional as regards
($\tilde{\mu}$, $\tilde{h}$), of the mean $\bar{x}$ of the next $n^{\circ}$ observations exists even though n < r and
is by (11b) multivariate Student with parameter $(m,\ n_u(V + MI)^{-1},\ 1)$, where
$n_u = n^{\circ}n/(n^{\circ}+n)$.  This distribution is, in effect, a probabilistic forecast of $\bar{x}$.
From (19a) it also follows that the distribution, unconditional as regards
($\tilde{\mu}$, $\tilde{h}$), of the posterior mean m" prior to observing $x^{(n+1)}, \ldots, x^{(n+n^{\circ})}$ is
multivariate Student with parameter $(m,\ (1 + \frac{n}{n^{\circ}})\,n\,(V + MI)^{-1},\ 1)$.
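
The scalar factors that multiply the inverse of V + M I in the two Student parameters above can be sketched as follows. The function name is hypothetical, and the closed form (1 + n/n°) n is an algebraic consequence of (19a) assumed here rather than quoted from the source: it is the factor ((n + n°)/n°)² n_u with n_u = n°n/(n° + n).

```python
def predictive_scale_factors(n, n_next):
    """Scalars multiplying (V + M I)^{-1} when nu'' = 1.

    n      : size of the sample already observed
    n_next : number of future observations (n-degree in the text)
    Returns (n_u, factor), where n_u applies to the mean of the next
    n_next observations and factor applies to the future posterior mean.
    """
    n_u = n_next * n / (n_next + n)
    factor = ((n + n_next) / n_next) ** 2 * n_u
    return n_u, factor


# Hypothetical sizes: n = 2 observed, n_next = 3 still to come.
n_u, factor = predictive_scale_factors(2, 3)
closed_form = (1 + 2 / 3) * 2
```

Setting `n_next = 1` gives n_u = n/(n + 1), the factor for a single further observation.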